feat: add support for GetTimestamp, parse_to_date expressions #3467

rafafrdz wants to merge 16 commits into apache:main

Conversation
Review comment on spark/src/test/resources/sql-tests/expressions/datetime/to_date.sql (outdated, resolved)
support CometParseToDate expression
…rts and style issues
andygrove left a comment
I have a few concerns about this PR.
The micros-to-days conversion in spark_to_date uses (micros / 1_000_000 / 86_400) as i32. Integer division in Rust truncates toward zero, which gives wrong results for pre-epoch dates with non-midnight times. For example, a timestamp like "1969-12-31 12:00:00" would produce micros = -43200000000. Dividing by 1_000_000 gives -43200, then dividing by 86400 gives 0 (day 0 = 1970-01-01), when the correct answer is day -1 (1969-12-31). You would need floor division here, something like micros.div_euclid(1_000_000).div_euclid(86_400) as i32.
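The difference between truncating and floor division can be shown in a small standalone sketch (the function names here are illustrative, not the PR's actual helpers):

```rust
/// Truncating division, as in the current PR: rounds toward zero,
/// so negative (pre-epoch) micros land one day too late.
fn micros_to_days_truncating(micros: i64) -> i32 {
    (micros / 1_000_000 / 86_400) as i32
}

/// Floor division via div_euclid: always rounds toward negative
/// infinity, matching Spark's day numbering for pre-epoch dates.
fn micros_to_days_floor(micros: i64) -> i32 {
    micros.div_euclid(1_000_000).div_euclid(86_400) as i32
}

fn main() {
    // "1969-12-31 12:00:00" UTC is 12 hours before the epoch.
    let micros: i64 = -43_200_000_000;
    assert_eq!(micros_to_days_truncating(micros), 0); // wrong: day 0 = 1970-01-01
    assert_eq!(micros_to_days_floor(micros), -1); // correct: day -1 = 1969-12-31
    println!(
        "truncating = {}, floor = {}",
        micros_to_days_truncating(micros),
        micros_to_days_floor(micros)
    );
}
```

Positive timestamps are unaffected, since truncation and floor agree for non-negative operands.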
The spark_to_chrono format converter only handles a small subset of Spark's SimpleDateFormat patterns (yyyy, MM, dd, HH, mm, ss, fractions, XXX, Z). Patterns like M (single-digit month), d (single-digit day), yy (2-digit year), a (AM/PM), or E (day of week) would pass through unconverted and silently produce wrong results. It might be worth either validating that the format string only contains supported tokens and falling back to Spark for unsupported ones, or documenting the supported subset in getSupportLevel.
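One way to implement the "validate and fall back" option is to return `None` for any pattern letter outside the supported subset, rather than passing it through. This is a hedged sketch, not the PR's actual `spark_to_chrono` (which handles more tokens such as fractions and zone offsets); the function name and token table here are assumptions:

```rust
/// Sketch: translate a Spark SimpleDateFormat pattern to a chrono
/// strftime string, returning None for any unsupported token so the
/// caller can fall back to Spark's own implementation.
fn spark_to_chrono_checked(fmt: &str) -> Option<String> {
    let mut out = String::new();
    let bytes = fmt.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i] as char;
        // Length of the run of this repeated character (e.g. "yyyy" -> 4).
        let run = bytes[i..].iter().take_while(|&&b| b as char == c).count();
        match (c, run) {
            ('y', 4) => out.push_str("%Y"),
            ('M', 2) => out.push_str("%m"),
            ('d', 2) => out.push_str("%d"),
            ('H', 2) => out.push_str("%H"),
            ('m', 2) => out.push_str("%M"),
            ('s', 2) => out.push_str("%S"),
            // Any other pattern letter (M, d, yy, a, E, ...) is unsupported:
            // bail out instead of silently producing wrong results.
            (c, _) if c.is_ascii_alphabetic() => return None,
            // Literal separators pass through unchanged.
            (c, run) => {
                for _ in 0..run {
                    out.push(c);
                }
            }
        }
        i += run;
    }
    Some(out)
}

fn main() {
    assert_eq!(spark_to_chrono_checked("yyyy-MM-dd"), Some("%Y-%m-%d".to_string()));
    assert_eq!(spark_to_chrono_checked("M/d/yy"), None); // unsupported tokens
    println!("ok");
}
```

A `None` result could then be surfaced through `getSupportLevel` on the Scala side, so plans with unsupported patterns never reach the native operator.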
Speaking of getSupportLevel, CometParseToDate doesn't override it, so it defaults to Compatible(). It probably needs similar type checks to CometGetTimestamp for TimestampNTZType and other unsupported input types.
- Fix pre-epoch date conversion using div_euclid (floor division) so timestamps like "1969-12-31 12:00:00" correctly map to day -1 instead of day 0
- Add format validation in spark_to_chrono to reject unsupported Spark SimpleDateFormat tokens (e.g. single-digit month, 2-digit year, AM/PM) instead of silently producing wrong results
- Add getSupportLevel to CometParseToDate with type checks for TimestampNTZType and other unsupported input types
- Add Rust unit tests for pre-epoch dates, unsupported formats, and format validation
- Add SQL file tests and Scala tests for pre-epoch date handling
Summary
This PR adds support for the GetTimestamp expression, which in turn enables the ParseToDate expression and therefore the to_date function (Closes [Feature] Support Spark expression: parse_to_date #3093). In order to support the to_date Spark SQL function, we first need to cover GetTimestamp, since it appears as an intermediate expression in the query plan whenever ParseToDate (the to_date function) is invoked.

Note to maintainers
Since this is my first contribution to Apache DataFusion Comet, I may have missed some steps in the development or contribution process. Please let me know if anything is missing or needs adjustment.